ABSTRACT

Advanced learners of second languages and natural language processing
systems both demand more detailed lexical information than
conventional dictionaries provide. Text composition, whether by humans
or machines, requires a thorough understanding of relationships
between words, such as selectional restrictions, case patterns,
factives, and other kinds of verb implicature. For verbs, we need to
know whether they are action or stative, performative or not, and what
kinds of complements they take. It is important to know whether an
adjective is non-predicating, non-attributive, action, or stative. For
nouns, we need relations like taxonomy, part-whole, membership, and
modification, and also attributes like count, mass, human, and
animate. This paper discusses these and other kinds of lexical
information found only implicitly, if at all, in most commercial
dictionaries. (MSZ)

INTRODUCTION 

Advanced learners of second languages and natural language processing
systems both need much more detailed lexical information than
conventional dictionaries can provide. Native speakers say 'doctor of
medicine' but 'specialist in orthopedics,' even if they have to look
up orthopedics to discover the spelling or meaning. Complementizers
are especially confusing: wish and want are much alike, but we say 'I
wish (that) he would go' and 'I want him to go,' not 'I want that he
would go.' Most conventional dictionaries, even those that explain
subtle distinctions of meaning in a sophisticated vocabulary, assume
that their users know how to combine the simple words. Natural
language understanding and generation programs require even more
detailed lexical information and are less well-equipped to learn from
examples. It is the designers of dictionaries for advanced learners
who have led the way in categorizing the kind of information that is
needed and in trying to obtain and organize this information.
DICTIONARIES FOR ADVANCED LEARNERS

The first to propose a design for a radically new type of dictionary
were the Soviet linguists Apresyan, Mel'cuk, and Zholkovsky (1970).
They proposed an Explanatory-Combinatory Dictionary that would explain
the morphology of the word and its government patterns, and describe
the lexical universe of the entry word and the way it combines with
other words into phrases. The description of the lexical universe
places a term in its semantic field and discriminates between synonyms
and near synonyms. The most distinctive and original feature of their
proposal was the list of 'lexical functions.' These functions include
the classical relations of synonymy and taxonomy as well as about
fifty others, such as:

   Son    - typical sound     Son(cat) = meow
   Liqu   - destroying verb   Liqu(mistake) = to correct
   Prepar - ready for use     Prepar(table) = to lay
   Inc    - increase verb     Inc(tension) = to mount
   Dec    - decrease verb     Dec(cloth) = to shrink

Mel'cuk has published fifty sample entries for French (1984) and a
much more complete dictionary of Russian.
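
One way to picture lexical functions is as a lookup keyed by function
name and argument word. The sketch below is illustrative only: the
dictionary layout and the helper name apply_lf are assumptions made
here, not the representation used in the Explanatory-Combinatory
Dictionary. The five entries are the examples listed above.

```python
# Hypothetical in-memory table of lexical functions; the layout is an
# assumption made for illustration only.
LEXICAL_FUNCTIONS = {
    "Son": {"gloss": "typical sound", "values": {"cat": "meow"}},
    "Liqu": {"gloss": "destroying verb", "values": {"mistake": "to correct"}},
    "Prepar": {"gloss": "ready for use", "values": {"table": "to lay"}},
    "Inc": {"gloss": "increase verb", "values": {"tension": "to mount"}},
    "Dec": {"gloss": "decrease verb", "values": {"cloth": "to shrink"}},
}


def apply_lf(function, word):
    """Return the value of a lexical function for a word, or None if unknown."""
    entry = LEXICAL_FUNCTIONS.get(function)
    if entry is None:
        return None
    return entry["values"].get(word)


print(apply_lf("Son", "cat"))  # meow
```

Returning None for an unknown function or argument keeps a missing
entry distinct from an empty value.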

Three very interesting dictionaries have been published for advanced
learners of English: the Oxford Advanced Learner's Dictionary, edited
by Hornby (1974), the Collins English Learner's Dictionary (Carver,
1974), and the Longman Dictionary of Contemporary English (Procter,
1978). All three contain detailed information about selectional
restrictions, sentential complements, and semantic fields. The Longman
Dictionary has a controlled vocabulary of 2,000 words and comes in an
American version.

Although none of these dictionaries contains all the features
described by Mel'cuk, they provide advanced learners with information
not available in other English dictionaries. With great vision the
publishers of these dictionaries have made them available in
machine-readable form for research in lexicography and natural
language processing. The Longman tape contains further information too
bulky to put in the printed book.

It is clear that lexical knowledge involves not only words but
phrases. Becker (1975) argues that people generate text by sticking
together large swatches of preformed phrases, some only two or three
words in length ('by no means'), some a whole sentence ('I am so glad
to see you again'). Table 1 summarizes Becker's classification of
phrasal information needed in the lexicon. If natural language
processing systems are to create text that sounds natural, they have
to have phrasal lexicons.

If you take a strong lexicalist position, that is, if you believe that
much of our linguistic knowledge is stored in the lexicon, then the
range of what is considered to be lexical information expands to
include case arguments for verbs, generic fillers of functional
relation slots like subject and object, and triggers for syntactic
rules like dative shift (as in 'Mary gave the ball to me' vs. 'Mary
gave me the ball'). Also included are selectional restrictions,
collocations, and lexical-semantic relations such as taxonomy and
part-whole. Many of these new types of information are as important to
computers as to second language learners, along with traditional
lexical information like etymology, morphology, and phonology (all
being used by programs that read text aloud [Church, 1986]).
Furthermore, human lexical knowledge involves not only isolated words
and phrases but whole networks of related words. The easiest and most
natural way to express this kind of semantic information about the
words and phrases in the lexicon is to make extensive use of the
lexical functions proposed by Apresyan, Zholkovsky, and Mel'cuk (1970)
and of other lexical-semantic relations (Evens, Litowitz, Markowitz,
Smith, and Werner, 1980; Evens and Smith, 1978).

   1. Polywords               to blow up
   2. Phrasal Constraints     by pure (sheer) coincidence
   3. Deictic Locutions       for that matter
   4. Sentence Builders       X gave Y a song and dance about S
   5. Situational Utterances  You are very welcome!
                              How can I ever repay you?
   6. Verbatim Texts          When I consider how my life is spent

   Table 1. Categories from Becker's phrasal lexicon.

To build a large lexical database by hand would require the resources
available to the publisher of a commercial dictionary. The only
possible strategy is to extract as much information as possible from a
machine-readable dictionary. While several British dictionary
publishers have made dictionary tapes available for research and other
tape sources are available from the Oxford Archive, there is only one
American dictionary available to researchers in machine-readable form:
Webster's Seventh Collegiate Dictionary (W7). John Olney, who produced
the original W7 tapes, described his reasons for choosing to
transcribe W7 instead of another American dictionary (1968). He was
very favorably impressed by the large quantity of citations collected
by the staff at the G. & C. Merriam Company and their systematic
analyses of these citations.

W7 is an excellent source for lexical information. Some of that
information, such as part of speech, is stated explicitly in each
lexical entry, but even more information, particularly information
about lexical-semantic relationships, such as taxonomic relationships
and typical objects of verbs, is expressed implicitly and, therefore,
must be extracted from definitions. Given the quantity of data
available to us in W7 and our goal of building a large lexical
database, we decided to try to extract as much as possible
automatically. This decision implied that we had to parse the
definitions.

After much discussion of possible parsers we chose to use Sager's
(1981) Linguistic String Parser (LSP) from the Courant Institute at
New York University. Although the theoretical framework on which this
parser is based is somewhat out of fashion, the parser is an elegant,
modern piece of software, which has been used to parse a large number
of scientific papers. Sager and Grishman encourage others to use the
LSP and make available a set of well-written manuals. The LSP has a
large and sophisticated grammar, a ten thousand word lexicon, and
excellent facilities for adding rules to the grammar and for expanding
the lexicon. We have used the LSP to parse thousands of W7 definition
texts and have found the LSP to be a valuable tool for dictionary
research as well as for other natural language processing projects. We
would be glad to give copies of our grammar for W7 definitions (and
the LSP Mandarin grammar, which we have created for experiments in
parsing and text generation) to anyone interested.

In the remaining sections of this paper we discuss our concept of a
lexical database and describe our attempts to extract some of this
important lexical information from W7 using Sager's LSP.



LEXICONS FOR NATURAL LANGUAGE PROCESSING

Most existing natural language processing systems attack very
specialized problems using handmade lexicons containing only a few
hundred words. Before natural language processing systems can expand
to understand input from wider domains, they need much larger lexicons
containing precise and detailed syntactic and semantic information.
Text generation systems require even more knowledge than natural
language understanding systems.

We have set out to build a large relational lexicon for natural
language processing applications containing as much detailed syntactic
and semantic information as possible (Ahlswede, Evens, Markowitz, and
Rossi, 1986). Whenever it is feasible, we have extracted information
automatically from W7. We began by constructing an interactive lexicon
builder (Ahlswede, 1985b) for use when we could not find the
information we needed in machine-readable form, or when further human
input was required to classify entries properly. The interactive
lexicon builder includes routines that add an entry, edit existing
entries, give a list of all the relations being used in the lexicon
with examples, keep track of words that have been used in other
entries but are not yet defined themselves, etc.
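
The routines just listed can be pictured with a small sketch. This is
not the lexicon builder described in Ahlswede (1985b); the class name,
method names, and the representation of an entry as a list of
(relation, word) pairs are all assumptions made for illustration, and
the two sample entries are made-up data.

```python
from collections import defaultdict


class LexiconBuilder:
    """Toy relational lexicon: each headword maps to (relation, word) pairs."""

    def __init__(self):
        self.entries = {}

    def add_entry(self, headword, relations=None):
        # Store a copy so later edits do not alias the caller's list.
        self.entries[headword] = list(relations or [])

    def edit_entry(self, headword, relation, word):
        # Append one relation to an existing entry; fail loudly if absent.
        if headword not in self.entries:
            raise KeyError(headword)
        self.entries[headword].append((relation, word))

    def relations_with_examples(self):
        # Map each relation in use to every (headword, word) pair that uses it.
        examples = defaultdict(list)
        for headword, relations in self.entries.items():
            for relation, word in relations:
                examples[relation].append((headword, word))
        return dict(examples)

    def undefined_words(self):
        # Words mentioned inside entries that have no entry of their own.
        used = {word for rels in self.entries.values() for _, word in rels}
        return sorted(used - set(self.entries))


lexicon = LexiconBuilder()
lexicon.add_entry("puppy", [("child", "dog")])
lexicon.add_entry("dog", [("taxonomy", "animal")])
lexicon.edit_entry("puppy", "taxonomy", "dog")
print(lexicon.undefined_words())  # ['animal']
```

Tracking undefined words as a set difference means the list shrinks
automatically as new entries are added.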
All entries contain relational information regardless of the part of
speech of the headword; the other information included depends heavily
on the part of speech. Verb entries are the most extensive; they
contain case information combined with selectional restrictions, and
tell whether the verb is action or stative and whether it can be put
in the passive voice or not. If the verb is a performative, then the
performative class is given. If it can take sentential complements,
then the complementizers are listed, along with information about
implicature and whether the verb supports not-transportation. Noun
entries list plural forms, factivity, and attributes such as animate,
human, concrete, and count vs. mass. For adjectives we include
selectional information and action vs. stative status. If the
adjective cannot appear in predicate position or attributive position,
that fact is noted. Special classes of adjectives are marked as being
ordinal or cardinal, as well as for color, size, time, etc. We are
still trying to figure out adverb categories, aside from the obvious
time, duration, position, manner, cause, etc.
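
As a rough picture of what a verb entry holds, the sketch below
collects the fields named in this section into one record. The class
name, field names, and types are hypothetical and are not the format
of our lexicon; the values shown for promise are taken from examples
elsewhere in this paper, and fields the paper does not discuss are
left as None.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class VerbEntry:
    """Hypothetical record for the verb information described in the text."""
    headword: str
    action: bool                                 # True for action, False for stative
    passivizable: Optional[bool] = None          # None when not yet determined
    performative_class: Optional[str] = None
    complementizers: List[str] = field(default_factory=list)
    implicature_class: Optional[str] = None
    not_transportation: Optional[bool] = None


promise = VerbEntry(
    "promise",
    action=True,                        # performatives are action verbs
    performative_class="Commissives",   # from the table of performative verbs
    complementizers=["to"],             # as in 'I promised not to go'
    not_transportation=False,           # moving 'not' changes the meaning
)
print(promise.performative_class)  # Commissives
```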



RELATIONS IN THE LEXICON 

Lexical-semantic relations express relationships between words and
concepts in the dictionary. They include Mel'cuk's lexical functions
as well as case relations like agent, patient, and instrument;
collocational relations, which identify words that go together like
bread and butter; concrete relations such as part-whole and
made-out-of; and various types of grading relations (as expressed in
Monday - Tuesday - Wednesday and hot - warm - cool - cold). Synonymy
and antonymy are the only relations expressed overtly in W7;
therefore we have had to search for hidden expressions of other
relations.

Our greatest success has come from recurring word patterns that signal
specific relationships. These patterns are often called 'defining
formulae.' Defining formulae consist of one or more specific words in
a rigid pattern; sometimes they also involve special punctuation like
parentheses (Smith, 1985). Table 2 shows a few of the defining
formulae that appear in W7 along with the relations that they
identify. The formula "Any" + NP consistently signals a taxonomic
relationship between the noun being defined and the head noun of the
NP. The similar pattern "Any of a" + NP usually marks a biological
taxonomy, with the scientific name of the taxonomic superordinate
given in parentheses. The formula "to make" + Adj clearly expresses a
causative. The formula "To" + VP + ("as" NP) names the typical object
of the verb being defined inside the parentheses. More details about
defining formulae for nouns in W7 can be found in Markowitz, Ahlswede,
and Evens (1986) and Amsler (1980).

Defining formulae often tell us about attributes too. Noun attributes
include count vs. mass, concrete vs. abstract, human vs. animate vs.
inanimate, and gender. The formula "A member of" + NP tells us about
the element-set relation and also signals that the noun being defined
is human. The formula "One who" + VP also signals a human noun, while,
at the same time, giving us the generic agent for the verb. We hoped
that the formula "One that" + VP would signal a non-human noun, but
that turned out not to be true. Most, but not all, of the nouns
defined in this way are human.



   Formula             Relation     Examples

   "any" + NP          taxonomy     nectar: any delicious drink
   "any of a"                       capuchin: any of a genus (Cebus)
                                      of South American monkeys
   "young"             child        puppy: a young dog
                                    lamb: a young sheep
   "to make"           cause        heat: to make warm or hot
     + Adj                          redden: to make red or reddish
   "to" + V            generic      mount: to put or have (as
     + ("as" N)        object         artillery) in position
                                    lay: to bring forth and deposit
                                      (an egg)
   "one who"           generic      ghost: one who ghost-writes
   "one that"          agent        instructor: one that instructs

   Table 2. Defining formulae from W7.
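
The formulae in Table 2 can be approximated with ordinary pattern
matching. The sketch below is not the LSP-based procedure used in this
work; it is a regular-expression stand-in, and the pattern list, the
relation labels as strings, and the function name match_formula are
assumptions. It returns the text that follows the formula rather than
the head noun, since finding the head requires parsing.

```python
import re

# (relation, compiled pattern) pairs; checked in order, first match wins.
FORMULAE = [
    ("taxonomy", re.compile(r"^any of an? (?P<arg>.+)$")),
    ("taxonomy", re.compile(r"^any (?P<arg>.+)$")),
    ("child", re.compile(r"^a young (?P<arg>.+)$")),
    ("cause", re.compile(r"^to make (?P<arg>.+)$")),
    ("generic object", re.compile(r"^to .*\((?:as )?(?P<arg>[^)]+)\)")),
    ("generic agent", re.compile(r"^one (?:who|that) (?P<arg>.+)$")),
]


def match_formula(definition):
    """Return (relation, matched text) for the first formula that fits, else None."""
    text = definition.strip().lower()
    for relation, pattern in FORMULAE:
        found = pattern.search(text)
        if found:
            return relation, found.group("arg").strip()
    return None


print(match_formula("to put or have (as artillery) in position"))
# ('generic object', 'artillery')
```

Ordering matters here: "to make" is tested before the more general
"to" + V pattern so that causatives are not misread as object formulae.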



VERB CLASSES 

The stative/action distinction is important in the generation of
dialog. Stative verbs characterize states of being like owning, being,
and resembling, while action verbs name acts like moving, thinking,
and doing. Not surprisingly, most verbs fall into the action class and
are characterized by their ability:

   1. to appear in imperative form (e.g., 'Move! Bite that dog!' but
      not 'Resemble your mother!' and 'Own the house!')

   2. to take the progressive aspect (e.g., 'He is moving; he is
      biting the dog,' but not 'She is resembling her mother.')

   3. to serve in sentential complements of verbs of ordering (e.g.,
      'I told her to bite the dog,' but not 'I told her to resemble
      her mother.')

The best clue we have found for identifying action verbs in W7 is to
look at the definitions of nouns derived from verbs. Those that are
defined as "the act of <x>ing," where x is a verb, are typically
action verbs. We have taken this route because we have been unable to
extract consistent formulae directly from the verb definitions and the
verb entries in W7 do not tell us which verbs normally are used in
imperative or in progressive forms. Unfortunately, the formula "the
quality or state of <x>ing" is not a reliable signal for stative verbs
(e.g., "condensation: the quality or state of ...

A verb supports not-transportation if not, never, and other adverbs of
negation can be moved from the complement clause to the main clause
without making a significant alteration in the meaning. The verb want
supports not-transportation: 'I did not want to go' and 'I wanted not
to go' have essentially the same meaning. The verb promise, on the
other hand, does not display this attribute; 'I did not promise to go'
and 'I promised not to go' have very different meanings.

Some verbs that take sentential complements display rather complex
implication patterns between the main verb and the complement. Verbs
like realize, for example, indicate that the speaker presumes the
complement to be true, e.g., 'Mary realized that she was wearing magic
shoes.' Verbs like pretend, on the other hand, imply that their
complements are false, as in, 'Mary pretended that she was wearing
magic shoes.' The Kiparskys (1970) gave the name factive to the class
of verbs that behave like realize and pointed out that the presumption
holds even if the main verb is negated, as in, 'Mary did not realize
that she was wearing magic shoes.' Joshi and Weischedel (1973) did a
much more complete analysis of implicature relations between verbs and
their complements; their results are summarized in Table 5. (Here R
stands for the main verb, S for the sentential complement.)

Implicature classes are very important for discourse understanding and
generation because they link the discourse to the speaker's view of
the world. To date we have not been able to find a satisfactory way of
identifying the implicature class of a verb by simply using W7. We are
trying to see if we can extract more clues from Householder's verb
categories.



   Class               Implicational      Examples
                       Structure

   Factive              R(S) -->  S       Jerry realized that
                       ~R(S) -->  S       Meg baked the cake.

   Implicative          R(S) -->  S       We managed to
                       ~R(S) --> ~S       finish the job.

   Only-If             ~R(S) --> ~S       They allowed Jim
                                          to visit China.

   If                   R(S) -->  S       Larry persuaded Bill
                                          to accept the job.

   Negative-If          R(S) --> ~S       Larry prevented Bill
                                          from winning.

   Negative             R(S) --> ~S       John failed to go.
   Implicative         ~R(S) -->  S

   Counter-Factive      R(S) --> ~S       Mary pretended that
                       ~R(S) --> ~S       Ben went home.

   Table 5. Classification of main verbs in predicate
            complement constructions (adapted from
            Joshi and Weischedel, 1973).
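
Once a verb's implicature class is known, the classification in
Table 5 can be applied mechanically. The following sketch assumes the
reading of Table 5 in which each class pairs the affirmed or negated
main clause with S, with not-S, or with no commitment; the dictionary
layout, the lowercase class names, and the function name
complement_truth are hypothetical.

```python
# Each class maps the polarity of the main clause R(S) to what it implies
# about the complement S: True (S holds), False (S fails), None (no commitment).
IMPLICATURE = {
    "factive":              {True: True,  False: True},
    "implicative":          {True: True,  False: False},
    "only-if":              {True: None,  False: False},
    "if":                   {True: True,  False: None},
    "negative-if":          {True: False, False: None},
    "negative implicative": {True: False, False: True},
    "counter-factive":      {True: False, False: False},
}


def complement_truth(verb_class, main_clause_affirmed):
    """Return True, False, or None for the implied truth of the complement."""
    return IMPLICATURE[verb_class][main_clause_affirmed]


# 'Mary did not realize that she was wearing magic shoes.'
print(complement_truth("factive", False))  # True
```

A generator could use the same table in reverse, choosing a verb whose
class commits the speaker to exactly what is to be conveyed.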

An interesting class of verbs called 'performatives' was first
described by Austin (1962) as part of his theory of speech acts.
Performatives are action verbs which, when spoken, actually perform an
act. When, for example, people say, 'I warn you,' they are
simultaneously uttering some words and performing an act of warning.
Performative verbs were also studied by Vendler (1972) and then
Vendler's classification was reviewed and reorganized by McCawley
(1979). We have actually been using McCawley's categories in our
lexicon and, therefore, Table 6 represents McCawley's point of view.
To date, we have been unable to identify defining formulae for
performatives, but we have achieved some success in classifying
additional verbs by checking to see if the sense-level synonyms for
definitions of a verb appear in our lists of performative verbs.



   Class          Description                      Examples

   Verdictives    "essentially giving a            acquit
                  finding as to something"         diagnose
                  (Austin, 1962, p. 150)           estimate

   Commissives    "promising or otherwise          promise
                  undertaking"                     espouse
                                                   agree

   Behabitives    "have to do with attitudes       curse
                  and social behavior"             thank
                  (p. 151)                         apologize

   Expositives    "make plain how our utter-       concede
                  ances fit into the course        illustrate
                  of an argument or conver-        assume
                  sation"

   Operatives     "acts by which the speaker       abdicate
                  makes something the case"        appoint
                  (McCawley, 1979, p. 163)         levy

   Exercitives    McCawley divides in two:

     Imperative   "an imperative act gets the      admonish
                  addressee to do the thing        forbid
                  in question because it is        beg
                  the speaker's desire"

     Advisories   "an advisory act gets him        advise
                  to do it because it is ..."      exhort

   Table 6. Performative verbs.
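
The synonym check described before Table 6 can be sketched as a set
lookup. The seed lists below are the examples in Table 6; the function
name classify_by_synonyms, the lowercase class labels, and the sample
synonym list are assumptions, and the synonyms are made up rather than
taken from W7.

```python
# Seed lists taken from Table 6.
PERFORMATIVES = {
    "verdictive": {"acquit", "diagnose", "estimate"},
    "commissive": {"promise", "espouse", "agree"},
    "behabitive": {"curse", "thank", "apologize"},
    "expositive": {"concede", "illustrate", "assume"},
    "operative": {"abdicate", "appoint", "levy"},
    "imperative": {"admonish", "forbid", "beg"},
    "advisory": {"advise", "exhort"},
}


def classify_by_synonyms(synonyms):
    """Return the sorted performative classes whose seed lists contain a synonym."""
    found = {cls for cls, verbs in PERFORMATIVES.items()
             for syn in synonyms if syn in verbs}
    return sorted(found)


# Hypothetical sense-level synonyms for some unclassified verb.
print(classify_by_synonyms(["promise", "swear"]))  # ['commissive']
```

Returning every matching class, rather than the first, leaves it to a
human reviewer to resolve verbs whose synonyms point in two directions.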



ADJECTIVE CATEGORIES 

We have developed a large list of useful adjective relations
(Ahlswede, 1985a), but we are still searching for more information
about adjective classes and relevant attributes. The action/stative
distinction seems to be as important for adjectives as it is for
verbs. There is one important difference, however: adjectives seem to
be stative more often than not, while more verbs seem to belong to the
action category. Action adjectives behave much like action verbs. They
occur after imperative and progressive forms of the verb to be. Kind
is an action adjective while tall is stative, as the examples in (1)
make clear:

   (1)   Be kind!
       * Be tall!
         Sally is only being kind.
       * Sally is only being tall.

The stative-action parameter seems to be easier to identify in W7
definitions for adjectives than it is for verbs. The many adjectives
defined by the formula "Of or relating to" seem to be stative, e.g.,
"literary: of or relating to books." Adjectives defined as "Being ..."
seem to belong consistently to the action class, e.g., "cursed: being
under or deserving a curse."
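
The two adjective formulae just mentioned are simple enough to test
with a prefix match. This is a minimal sketch, assuming definitions
arrive as plain strings; the function name adjective_class and the
returned labels are hypothetical, and definitions that fit neither
formula are left unclassified.

```python
import re

STATIVE = re.compile(r"^of or relating to\b", re.IGNORECASE)
ACTION = re.compile(r"^being\b", re.IGNORECASE)


def adjective_class(definition):
    """Guess 'stative' or 'action' from the opening formula, else None."""
    text = definition.strip()
    if STATIVE.match(text):
        return "stative"
    if ACTION.match(text):
        return "action"
    return None


print(adjective_class("of or relating to books"))           # stative
print(adjective_class("being under or deserving a curse"))  # action
```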

While most adjectives can appear in both attributive and predicate
positions, some are non-predicating and others are non-attributive.
It is perfectly appropriate to refer to our neighbor as 'an electrical
engineer,' but we do not say 'this engineer is electrical.' The phrase
'a civil engineer' is ambiguous, because it may refer to a person who
designs bridges or to a polite engineer. If we say, 'The engineer is
civil,' the ambiguity disappears; only the polite sense is possible.
Two very common non-attributive adjectives are awake and asleep. I can
say 'My class is awake' or 'My class is asleep,' but I cannot refer to
'my awake class' and 'my asleep class.'

Another problem for text generation programs and advanced learners who
are trying to write down complex ideas in English is the rule for
combining a number of adjectives in attributive position. This rule
seems to depend very markedly on the semantic categories of the
adjectives in question. One version of this rule (Winograd, 1971) can
be phrased:

   demonstrative > ordinal > cardinal > general > size > color

as in 'these first six handsome large red trucks.' In our lexical
database we mark adjectives according to the categories ordinal,
cardinal, size, and color, along with time and measure, but we are
sure that we are missing many other categories and much important
selectional information for adjectives.
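
For a text generator, the ordering rule above amounts to sorting
modifiers by category rank. The sketch below assumes each word already
carries one category mark; the CATEGORY table, the fallback to
"general" for unmarked words, and the function name order_modifiers
are assumptions made for illustration.

```python
ORDER = ["demonstrative", "ordinal", "cardinal", "general", "size", "color"]
RANK = {category: position for position, category in enumerate(ORDER)}

# Hypothetical category marks for the words in the example phrase.
CATEGORY = {
    "these": "demonstrative", "first": "ordinal", "six": "cardinal",
    "handsome": "general", "large": "size", "red": "color",
}


def order_modifiers(words):
    """Sort prenominal modifiers by category rank; unmarked words count as general."""
    return sorted(words, key=lambda w: RANK[CATEGORY.get(w, "general")])


scrambled = ["red", "six", "large", "these", "handsome", "first"]
print(" ".join(order_modifiers(scrambled) + ["trucks"]))
# these first six handsome large red trucks
```

Because Python's sort is stable, two modifiers in the same category
keep the order in which they were supplied.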



CONCLUSION

If we are going to do a better job of natural language processing,
then we need to make explicit things which are implicit or missing in
current commercial dictionaries. In this paper we have only touched on
a few types of lexical information that we expect will be available in
the dictionaries of the future. We hope that these dictionaries will
also serve advanced learners of second languages.